Correlation of Country, Government, Index & GDP Per Capita

Author

Brian Kwong, Connor Gamba, Joey Bailitz, Arneh Begi

Import the Needed Libraries

Please make sure you have the following libraries installed:

  1. Tidyverse
  2. Here
  3. Janitor
  4. Styler
  5. Reactablefmtr
  6. Gganimate
  7. Gifski
  8. formattable
Code
# Set output scope
options(scipen = 0)

# Clears environment
remove(list = as.vector(ls()))

#| output: False
library(tidyverse)
library(janitor)
library(styler)
library(reactablefmtr)
library(gganimate)
library(gifski)
library(formattable)

# Formats documents:
style_file(path = here::here("Src", "Correlation_of_Country_Goverment_Index_&_GDP_Captia.qmd"), style = tidyverse_style, strict = TRUE)

Introduction

Background

The following information contains data collected on the GDP of all countries and an independent international calculation of an absence of corruption index from a variety of factors including government transparency, public official use of state dollars for personal gain and general mistrust in government officials being honest in their actions between the years 1975 and 2021. Higher absence corruption index signals lower perceived corruption.

Our Data

Importing the Data

Code
corruption_data <- read_csv(here::here("Datasets", "abscorrup_idea.csv"), show_col_types = FALSE) |> janitor::clean_names()

gdp_per_cap_data <- read_csv(here::here("Datasets", "gdp_pcap.csv"), show_col_types = FALSE) |> janitor::clean_names()

Matching Our Daaset

Since our corruption only has data for years between [1975,2021] we filter our GDP per capita to those years too

Code
gdp_per_cap_data <- gdp_per_cap_data |> select(country, c(x1975:x2021))

Piviot Data to Long Format

Code
corruption_data <- corruption_data |>
  pivot_longer(cols = c(x1975:x2021), names_to = "Year", values_to = "Absence_Corruption_Index") |>
  mutate(Absence_Corruption_Index = as.numeric(Absence_Corruption_Index))

gdp_per_cap_data <- gdp_per_cap_data |>
  pivot_longer(cols = c(x1975:x2021), names_to = "Year", values_to = "GDP_Per_Capita") |>
  mutate(GDP_Per_Capita = if_else(str_detect(GDP_Per_Capita, "k$"), (as.numeric(str_remove_all(GDP_Per_Capita, "k$")) * 10000), as.numeric(GDP_Per_Capita)))

Data Joins

Code
full_data <- corruption_data |> inner_join(gdp_per_cap_data, by = c("country", "Year"))

full_data <- full_data |>
  mutate(country = as_factor(country)) |>
  mutate(region = as.factor(case_when(
    country %in% c(
      "Antigua and Barbuda", "Argentina", "Bahamas", "Barbados",
      "Belize", "Bolivia", "Brazil", "Canada", "Chile", "Colombia",
      "Costa Rica", "Cuba", "Dominica", "Dominican Republic",
      "Ecuador", "El Salvador", "Grenada", "Guatemala", "Guyana",
      "Haiti", "Honduras", "Jamaica", "Mexico", "Nicaragua", "Panama",
      "Paraguay", "Peru", "St. Kitts and Nevis", "St. Lucia",
      "St. Vincent and the Grenadines", "Suriname",
      "Trinidad and Tobago", "USA", "Uruguay",
      "Venezuela"
    ) ~ "The Americas",
    country %in% c(
      "Albania", "Andorra", "Armenia", "Austria", "Azerbaijan",
      "Belarus", "Belgium", "Bosnia and Herzegovina", "Bulgaria",
      "Croatia", "Cyprus", "Czech Republic", "Denmark", "Estonia",
      "Finland", "France", "Georgia", "Germany", "Greece", "Holy See",
      "Hungary", "Iceland", "Ireland", "Italy", "Latvia",
      "Liechtenstein", "Lithuania", "Luxembourg", "North Macedonia",
      "Malta", "Moldova", "Monaco", "Montenegro", "Netherlands",
      "Norway", "Poland", "Portugal", "Romania", "Russia", "San Marino",
      "Serbia", "Slovak Republic", "Slovenia", "Spain", "Sweden",
      "Switzerland", "Turkey", "Ukraine",
      "UK"
    ) ~ "Europe",
    country %in% c(
      "Algeria", "Angola", "Benin", "Botswana", "Burkina Faso",
      "Burundi", "Cameroon", "Cape Verde", "Central African Republic",
      "Chad", "Comoros", "Congo, Dem. Rep.", "Congo, Rep.",
      "Cote d'Ivoire", "Djibouti", "Egypt", "Equatorial Guinea",
      "Eritrea", "Ethiopia", "Gabon", "Gambia", "Ghana", "Guinea",
      "Guinea-Bissau", "Kenya", "Lesotho", "Liberia", "Libya",
      "Madagascar", "Malawi", "Mali", "Mauritania", "Mauritius",
      "Morocco", "Mozambique", "Namibia", "Niger", "Nigeria", "Rwanda",
      "Sao Tome and Principe", "Senegal", "Seychelles",
      "Sierra Leone", "Somalia", "South Africa", "Sudan", "Eswatini",
      "Tanzania", "Togo", "Tunisia", "Uganda", "Zambia", "Zimbabwe",
      "South Sudan"
    ) ~ "Africa",
    country %in% c(
      "Afghanistan", "Australia", "Bahrain", "Bangladesh", "Bhutan",
      "Brunei", "Cambodia", "China", "Fiji", "Hong Kong, China",
      "India", "Indonesia", "Iran", "Iraq", "Israel", "Japan", "Jordan",
      "Kazakhstan", "Kiribati", "North Korea", "South Korea", "Kuwait",
      "Kyrgyz Republic", "Lao", "Lebanon", "Malaysia", "Maldives",
      "Marshall Islands", "Micronesia, Fed. Sts.", "Mongolia",
      "Myanmar", "Nauru", "Nepal", "New Zealand", "Oman", "Pakistan",
      "Palau", "Papua New Guinea", "Philippines", "Qatar", "Samoa",
      "Saudi Arabia", "Singapore", "Solomon Islands", "Sri Lanka",
      "Syria", "Taiwan", "Tajikistan", "Thailand", "Timor-Leste",
      "Tonga", "Turkmenistan", "Tuvalu", "UAE",
      "Uzbekistan", "Vanuatu", "Palestine", "Vietnam",
      "Yemen"
    ) ~ "Asia-Pacific"
  )))

Understanding Our Data

The following ER diagram represents how our data is structured

Code
# ER Diagram
knitr::include_graphics(here::here("Figures", "ERDiagram.png"))

Variable Type Description
🔑Country String Full Name of a Country
🔑Year Int Year This data was collected
Country’s Absence of Corruption Index Optional(Double) A index measuring how corrupt a perceived country’s government is. Lower value means less trust and more corruption. NA means no data is unavailable for that year. Please check the section about NA values for some possible reasons
GDP Per Captia Double Calculated GDP per Captia (GDP/Population) in USD

The two quantitative variables our team have decided to focus on are GDP per Capita (GDP/Population) and Absence of Corruption as an indexed range. These variables are measured for all 172 countries. Below are the descriptions of each and what they represent, specifically in our data set: GDP per Capita (GDP/Population): This variable represents a countries GDP or Gross Domestic Product for a given year divided by the countries population size. In other words, this variable is simply the per-person GDP for that (Ingabire 2020) country.Absence of Corruption Index: This variable represents the lack of corruption within a countries public administration.(Gapminder 2023) This indicator is determined by assessing public sector corrupt exchanges, public sector theft, executive embezzlement and theft, executive bribery and corrupt exchanges. These values are determined by assessing the extent that public officials within an office administration do not use their power for personal benefit or gain. The scaling of this index is such that a lower value represent a higher absence of corruption, or simply put less corruption. Conversely, a high value for this variable represents a lower absence of corruption, or simply put, more corruption. The categorical variable Country refers to the country that is being observed or referred to when assessing the corresponding values for GDP per Capita and Absence of Corruption. The quantitative variable Year, refers to the given year that corresponds to the other variables observations. Year refers to data collected from January to December for the given year.

Summary

Code
full_data |>
  select(country) |>
  summarise(Count = n_distinct(country))

There are a total of 172 countries in this study

Below is a quick summary of the average mean absence corruption index and GDP per Capita out of every country for every year

Code
full_data |>
  group_by(country) |>
  mutate(across(.cols = c(Absence_Corruption_Index, GDP_Per_Capita), .fns = ~ mean(.x, na.rm = TRUE))) |>
  group_by(Year) |>
  summary() |>
  unclass() |>
  as_tibble() |>
  select(Absence_Corruption_Index, GDP_Per_Capita) |>
  drop_na() |>
  rename("Absence Corruption Index" = Absence_Corruption_Index, " GDP Per Capita (USD)" = GDP_Per_Capita) |>
  mutate_all(.funs = ~ prettyNum(.x, ",")) |>
  reactable() |>
  add_title("Summary of 
Absence Corruption Index & GDP Per Capita Data", font_size = 20, align = "center")

Summary of Absence Corruption Index & GDP Per Capita Data

About those Pesky NAs

One thing we do want to acknowledge is the presence of NAs in our datasheet, particularly in the corruption index. There could be a multitude of reasons why this that data isn’t present for a country in a particular year. Some examples are that the data was lost, data was measured or calculated incorrectly, some disaster prevented the data from being collected, there is an insufficient stability for determination to occur or that the government doesn’t want the public to know about their corruption (therefore attempting to censor it). All these factors, some randomized and some not, will be taken into our consideration when determining the relationship between corruption index and GDP per capita.

Possible Hypothesis

  • Explanatory Variable : Absence of Corruption of Index
  • Response Variable : GDP Per Capita

We believe that a higher absence of corruption index will lead to a higher GDP per capita. Traditionally, our more stable and democratic forms of governments have higher standards of living, less internal and or external conflicts, equal opportunities for all individuals, and large amounts of well fares for their constituency. Furthermore, in general, they have stronger economies that are more reliant to wild economic swings. For instance much of modern Europe has a high GDP per capita because of its modernization, democratic governments and tight integration with the EU.

Data Visualizations

Code
# Renames the year from xYYYY -> YYYY
full_data <- full_data |> mutate(Year = as.numeric(str_remove_all(Year, "^x")))

History of Absence Of Corruption vs GDP per Capita

Code
full_data |>
  # Drops NA values cant be plotted...
  drop_na() |>
  ggplot(aes(x = Absence_Corruption_Index, y = GDP_Per_Capita, color = country)) +
  # Plots each country's data
  geom_point(alpha = 0.7, show.legend = FALSE) +
  # Drops the colour guide wouldn't be helpful for 172 countries...
  guides(color = "none") +
  # Seperates by region
  facet_wrap(~region) +
  # Makes a regression line for easier viewing
  geom_smooth(mapping = aes(x = Absence_Corruption_Index, y = GDP_Per_Capita, color = "blue"), method = "lm", se = FALSE) +
  # Tittle stuff
  labs(title = "Absence of Corruption vs GDP per Capita", subtitle = "Year: {frame_time}", x = "Absence of Corruption", y = "GDP Per Capita (USD)") +
  theme(axis.title.y = element_text(vjust = 0.5, angle = 0), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  # Animating by year
  transition_time(as.integer(Year)) +
  ease_aes("linear")

In this animation, we can see the observations and fit line for each year separated by region, with each color representing one country across time. Starting in Europe, we can see a steady rise in GDP per Capita across countries with higher Absence of Corruption over the years, leading to a steeper, or more positive, relationship in recent years. Europe is also home Monaco, a country with a GDP per Capita of over 2 million, and the only country that is not visible across the graphs. The Americas produces an interesting time-lapse because most countries don’t change significantly in GDP per Capita, they slowly become less corrupt, which makes the relationship more positive. The two high outliers in this region are the USA in pink and Canada in yellow, and while they both gradually rise in GDP per Capita, the USA experiences a drop in Absence in Corruption in the late 2010s.

The African plot is the only plot to display a negative relationship at any point, which is likely because the vast majority of countries in this region have extremely low GDP per Capita, so a country with any value on the x-axis can influence the line with a spike in GDP per Capita. In particular, the countries Gabon and Libya had the highest GDP per Capita in the 1980s, despite having Absence of Corruption values in the 20’s, so they heavily increased the trend and turned it negative. But the trend in Africa eventually becomes positive. Finally, the Asia-Pacific region has features similar to all 3 other plots, with many countries close to the x-axis as well as countries with high Absence of Corruption values increasing in GDP per Capita over time. The most interesting feature here is three countries with mediocre values in Absence of Corruption having high GDP per Capita near the beginning, and they fluctuate intensely across the years. These three countries are Qatar, the UAE, and Kuwait. While this happens, Singapore, a country with high Absence of Corruption, rises to the top of the continent, making the trend more positive.

With this animation, we have strong reason to believe our original hypothesis. So now we will test the correlation with regression.

2021 Absence Of Corruption vs GDP per Capita

Below is a plot of countries and their respective Absence of Corruption Index vs. GDP Per Capita in the year 2021. At this point, we note that all of the plots from this point use 2021 as the year. This particular year is our basis, since it is has no NA values and is balanced. Furthermore, it is a recent year, fairly representative of the current times.

Code
full_data |>
  filter(Year == 2021) |>
  ggplot(mapping = aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Absence of Corruption vs GDP per Capita 2021", x = "Absence of Corruption", y = "GDP per Capita (USD)") +
  theme(axis.title.y = element_text(vjust = 0.5, angle = 0), plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma)

The scatterplot and fit line definitely indicate that there is a positive relationship between our variables. Aside from four high outliers, each data point lies relatively close to the fit line. The part that stands out the most is the amount of observations that are bunched up on the x-axis at 0. Many countries in our dataset only have a GDP per Capita in the hundreds or low thousands, so due to how the plot is scaled, they all appear on the same horizontal line. Because of this, we will try a log transformation so these observations are more visible.

Log Transformed 2021 Absence Of Corruption vs GDP per Capita

Code
# With Log Transformation
full_data |>
  filter(Year == 2021) |>
  mutate(GDP_Per_Capita = log(GDP_Per_Capita)) |>
  ggplot(mapping = aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Absence of Corruption vs GDP per Capita(Log Transformed)\n 2021", x = "Absence of Corruption", y = "Log of GDP per Capita (USD)") +
  theme(axis.title.y = element_text(vjust = 0.5, angle = 0), plot.title = element_text(hjust = 0.5))

The log transformation succeeded in its goal of breaking up the lower GDP per Capita observations. However, it is likely that the strength of our model was decreased. It seems that after the log transformation, the observations broke into two “pools” separated by GDP per Capita. So while there are no outliers on this axis, the vast majority of observations in the lower pool fall below the regression line, and in the higher pool, most observations are above it. To see how much an increase in score of absence of corruption really helps GDP per Captia and how strong the correlation between the explanatory and response variables are lets closely analyze a linear regression between these two vairables.

Data Modeling & Prediction

Creating our Regression Models

While the log transformation gives us an easier way to see the upwards trend between Absence of Corruption Index and the nation’s GDP per capita, it actually does NOT help our linear model. In fact, comparing the computed R-squared values is worse on the log transformed model.

Code
# 2021 Data

full_2021_data <- full_data |> filter(Year == 2021)

# 2021 Data with log transformation

log_2021_data <- full_2021_data |> mutate(GDP_Per_Capita = log(GDP_Per_Capita))
Code
full_2021_data_lm <- lm(GDP_Per_Capita ~ Absence_Corruption_Index, full_2021_data)
log_2021_data_lm <- lm(GDP_Per_Capita ~ Absence_Corruption_Index, log_2021_data)
tibble("R Squared Non Transformed" = summary(full_2021_data_lm)$r.squared, "R Squared Log Transformed" = summary(log_2021_data_lm)$r.squared) |>
  reactable() |>
  add_title("Comparison of R Squared of Two Linear Models", font_size = 20, align = "center")

Comparison of R Squared of Two Linear Models

With that in mind, we will be sticking with our untransformed data for the remainder of this exploration.

Code
# Removes the transformed log models
remove(log_2021_data, log_2021_data_lm)

# Summary of Linear Model :
broom::tidy(full_2021_data_lm) |>
  rename("Regression Variable" = term, "Model Estimate" = estimate, "Standard Error" = std.error) |>
  select(c("Regression Variable":"Standard Error")) |>
  mutate_all(.funs = ~ prettyNum(.x, ",")) |>   reactable() |>
  add_title("Linear Model Stats", font_size = 20, align = "center")

Linear Model Stats

Our linear model comparing different countries in 2021 and their respective Absence of Corruption Index vs their 2021 GDP Per Capita yields us the following results.

\[ \beta_0 = -199747.304 \\ \beta_1 = 8211.695 \] which will give us the linear regression formula of : \[ \widehat{GDP\_Per\_Captia} = -199747.304+8211.695(Absence\_Corruption\_Index) \]

Analysis

With an \(R^2\) value of 0.50510, approximately 50% of our observed GDP per capita variability is accounted for in our correlation. For every score increase of 1 for a country’s Absence of Corruption Index, their GDP per capita will increase $8211.66.

Checking the Linear Model’s Accuracy

Although we have a mild \(R^2\) relation between Absence of Corruption Index and its response variable GDP per capita, we still want to check that the model fita. One way to check the average residual of our model is below.

Code
full_2021_data_lm_resd <- full_2021_data_lm |>
  broom::augment() |>
  select(.resid)
full_2021_data_lm_resd |>
  summarize("Avergae Residual" = mean(.resid)) |>
  reactable() |>
  add_title("Model Residual Stats", font_size = 20, align = "center")

Model Residual Stats

With a average residual \(<|10^{-9}|\), this looks pretty promising. However, to confirm our calculations and better visualize the distributions of residuals, lets plot them on a histogram. What we expect is a normal distribution of all residuals reminiscent of the familiar bell curve.

Code
options(scipen = 100)
full_2021_data_lm_resd |> ggplot(mapping = aes(x = .resid)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50) +
  labs(x = "Residuals", y = "Percentage of Distrubution") +
  geom_density(colour = "steelblue") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5))

Code
options(scipen = 0)

Another way we can also check our linear model is through the average variance or spread of our data set. Ideally we want our observed data to have lower variance as it means its more compactly distrubuted and easier for our model to predict

Code
# Variance Table for Observed, fitted and residual
full_2021_data_lm |>
  broom::augment() |>
  select(GDP_Per_Capita, .fitted, .resid) |>
  summarise(across(.cols = c(GDP_Per_Capita, .fitted, .resid), .fns = ~ var(x = .x))) |>
  rename("Observed" = GDP_Per_Capita, "Fitted" = .fitted, "Residual" = .resid) |>
  mutate_all(.funs = ~ prettyNum(.x, ",")) |>
  reactable() |>
  add_title("Model Variance Stats", font_size = 20, align = "center")

Model Variance Stats

Considering that we are working with thousands of dollars, a variance of 53038666343.0201 is alright, though its still quite far off our fitted (predicted model) variance of 26790021214.0854, so our model could be alot better. Only simulations will tell how truly accurate this model is.

Simulations and Model

Lastly lets perform a simulation on our data to see how our model performs compared to our observed data. In this simulation we will only be dealing with 2021 data as it has no NAs and represents the state of the world as it is currently in the early 21st century

Code
observed_2021_data <- full_2021_data |>
  select(GDP_Per_Capita) |>
  mutate(GDP_Per_Capita = as.numeric(GDP_Per_Capita)) |>
  pull()
predicted_2021_data <- full_2021_data_lm |> predict()

After grabbing our predicted and observed values for each country from 2021 we will introduce the standard error into our predicted data just to make it more realistic. Afterwards we will compared this modified simulated data to our gathered observed values. If our model is a good fit we should expect, a \(R^2\) value close tp 1 meaning that nearly all our (noise-induced) simulated data closely matches our observed data. We will then run 5000 trials and look at the minimum, average and maximum \(R^2\)

Code
generateRealisticPredictions <- function(predictedValues, standardError) {
  mutatedPredictedValues <- predictedValues + rnorm(length(predictedValues), 0, standardError)
  return(mutatedPredictedValues)
}

findModelFit <- function(linearModel) {
  newPredicted <- linearModel |> broom::augment()
  newPredicted <- newPredicted |>
    select(GDP_Per_Capita, .fitted) |>
    mutate(.fitted = generateRealisticPredictions(.fitted, sigma(linearModel)))
  observedPredictedLm <- lm(GDP_Per_Capita ~ .fitted, newPredicted)
  return(broom::glance(observedPredictedLm) |> select(r.squared) |> pull())
}
Code
set.seed(05081995)
sim_model_res <- map_dbl(.x = c(1:5000), .f = ~ findModelFit(full_2021_data_lm))
mean_sim_model_res <- mean(sim_model_res)
summary(sim_model_res) |>
  unclass() |>
  as.list() |>
  as.data.frame() |>
  pivot_longer(cols = c("Min.":"Max."), names_to = "Stat", values_to = "Values") |>
  mutate(Stat = str_remove(Stat, "[X]")) |>
  mutate(Stat = str_replace(Stat, "\\.", " ")) |>
  reactable() |>
  add_title("Model Residual Simulations Stats", font_size = 20, align = "center")

Model Residual Simulations Stats

Oh… it seems like our model was as good as we originally predicted. After standard error injection, only about 0.2575815 or 25.7581455 % of observed data can be modeled using our simulated data.

We can more visually see the spread of \(R^2\) we got from our 5000 simulations from plotting the \(R^2\) results on a histogram

Code
options(scipen = 100)
sim_model_res |>
  as.data.frame() |>
  ggplot(mapping = aes(x = sim_model_res)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50) +
  ggtitle(bquote("Distrubution of " ~ R^2 ~ "between Normalized Simulated Data and Observed Data")) +
  labs(x = bquote(R^2), y = "Percentage of\n Distrubution") +
  geom_density(colour = "steelblue") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5), plot.title = element_text(hjust = 0.5))

Addressing the NAs into consideration

Earlier we touched on the NAs present in our Absent of Corruption within our data, and listed some potential reasons for their absense. Now we will plot all entries that have NAs and their respective GDP per Capita for that year to verify if our original hypothesis is correct. We will also count the number of NAs from each region to see if we notice any trends between their regions and absence of any curruption information

Code
# Histogram of gpd-per capita by countries with N/A value for Corruption Index Value
na_data <- full_data |>
  filter(is.na(Absence_Corruption_Index))
Code
ggplot(na_data, aes(x = GDP_Per_Capita)) +
  geom_histogram() +
  scale_x_continuous(breaks = seq(min(na_data$GDP_Per_Capita), max(na_data$GDP_Per_Capita), by = 50000)) +
  labs(x = "GDP Per Capita", y = "Number of Countries", title = "GDP Per Capita For Countries Missing Corruption Index Values") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5), plot.title = element_text(hjust = 0.5))

Code
country_region <- full_2021_data |> select(country, region)
na_data_rich <- na_data |> filter(GDP_Per_Capita >= 50000)
na_after_soviet <- na_data |> filter(Year > 1991)
na_data |>
  group_by(country) |>
  summarise(NumbersOfNAEnrties = n()) |>
  inner_join(country_region, by = "country") |>
  slice_max(order_by = NumbersOfNAEnrties, n = 10, with_ties = FALSE) |>
  rename("Number of NA Enrties" = NumbersOfNAEnrties, "Region" = region, "Country" = country) |>
  reactable() |>
  add_title("Number of NAs from Countries", font_size = 20, align = "center") |>
  add_title("Ordered by Number of NAs", font_size = 12, align = "center")

Number of NAs from Countries

Ordered by Number of NAs

Code
na_after_soviet |>
  group_by(country) |>
  summarise(NumbersOfNAEnrties = n()) |>
  inner_join(country_region, by = "country") |>
  slice_max(order_by = NumbersOfNAEnrties, n = 10, with_ties = FALSE) |>
  rename("Number of NA Enrties" = NumbersOfNAEnrties, "Region" = region, "Country" = country) |>
  reactable() |>
  add_title("Number of NAs from Countries\n Post Collapse of the Soviet Union: 1991", font_size = 20, align = "center") |>
  add_title("Ordered by Number of NAs", font_size = 12, align = "center")

Number of NAs from Countries Post Collapse of the Soviet Union: 1991

Ordered by Number of NAs

Code
na_data_rich |>
  group_by(country) |>
  summarise(NumbersOfNAEnrties = n()) |>
  inner_join(country_region, by = "country") |>
  slice_max(order_by = NumbersOfNAEnrties, n = 10, with_ties = FALSE) |>
  rename("Number of NA Enrties" = NumbersOfNAEnrties, "Region" = region, "Country" = country) |>
  reactable() |>
  add_title("Number of NAs from Countries \n w GDP Per Capita over 50K USD", font_size = 20, align = "center") |>
  add_title("Ordered by Number of NAs", font_size = 12, align = "center")

Number of NAs from Countries w GDP Per Capita over 50K USD

Ordered by Number of NAs

The above histogram depicts the distribution of GPD’s Per Capita amongst countries that have missing values (N/A) for the absence of corruption index. This histogram includes all years from the original data set, 1975-2021, and all countries from this grouping are included, to better depict the distribution. The graph shows that there is a very large amount of countries, around 120, that have very low GDP’s per capita (around the several thousand dollars range) and do not contain values for absence of corruption. Aside from this, there appears to be a slightly right-skewed distribution centered at about 150,000, containing the remainder of the data from our sub-set group. Additionally, the max of this right tail hovers around 280,000. Some of this variation could be explained that less fortunate countries has less resources to measure these indexes. Additionally we see that many of these counties are “newer” countries, and only have formed in the past 30 years(Rosenberg 2019) number particularly from previously Soviet states. That being said, the large number of NAs from poorer countries, still possibly suggests that there leaders and governments are deliberate action taken to attempt to hide internal corruption, maybe spending the countries resources for their own benefit instead of reform and social welfare programs. The smaller but still significant “clump” of NAs on the right primary are composed of 2 groups, 1: rich monarchies trying to hide their powers like some in the Middle East as discussed earlier or 2: more present in this data, Eastern Europe nations particularly the Baltics(Åslund 2015) and to a lesser extent Balkans getting really rich as demonstrated in the 3rd table above which shows NA values with GDP per Captia over \(\$50,000\) , \(\frac{9}{10}\) of the countries were of European countries and half of them are now part of the EU(Union 2023), therefore, overall in many regards, the NAs should not be a major issue we need to contend with. For those countries that have been established for a while, a missing NAs does raise quite a suspicion of what their government are up to.

Conclsuion

Overall after all our analysis and models we still believe that an country’s absence of corruption index still has some role in shaping the GDP per Captia of a country. While not as strong and cohesive as we originally hypothesis, the overall however corruption among weather nations does still have us think it the absence of corruption index have a role in influencing ones economy. Nonetheless if this data has shown us anything, it is the fact that there’s no one factor that determines one country’s GDP per Captia, geography, access to natural resources, population, labor force, political structure and stability all contribute in shaping the GDP Per Captia.

References

Åslund, Anders. 2015. “The Baltic Tigers: Past, Present and Future.” CESifo Forum 4. https://www.cesifo.org/DocDL/forum-2015-4-aslund-baltic-tiger-december.pdf.
Gapminder. 2023. “_IDEA-Democracy Indices Data - V5.” Gapminder; Gapminder. https://docs.google.com/spreadsheets/d/1jYUZFQOQrE0bAjV9XVgr_92nMT-_ukYBs4Uom4rfVtQ/edit#gid=501532268&range=B17.
Ingabire, Diane. 2020. “Gapminder Documentation 001 -GDP Per Capita, Constant PPP Dollars V26.” Gapminder.org; Gapminder Foundation. https://s3.eu-west-1.amazonaws.com/static.gapminder.org/GapminderMedia/wp-uploads/20210512195520/Documentation_-GDP-per-Capita-v26.pdf.
Rosenberg, By Matt. 2019. “The World’s Newest Countries Since 1990.” ThoughtCo. https://www.thoughtco.com/new-countries-of-the-world-1433444.
Union, The European. 2023. “Easy to Read - the European Union.” european-union.europa.eu. https://european-union.europa.eu/easy-read_en.